System and method for three-dimensional object reconstruction from two-dimensional images

ABSTRACT

A system and method for three-dimensional acquisition and modeling of a scene using two-dimensional images are provided. The present disclosure provides a system and method for selecting and combining the three-dimensional acquisition techniques that best fit the capture environment and conditions under consideration, and hence produce more accurate three-dimensional models. The system and method provide for acquiring at least two two-dimensional images of a scene, applying a first depth acquisition function to the at least two two-dimensional images, applying a second depth acquisition function to the at least two two-dimensional images, combining an output of the first depth acquisition function with an output of the second depth acquisition function, and generating a disparity or depth map from the combined output. The system and method also provide for reconstructing a three-dimensional model of the scene from the generated disparity or depth map.

TECHNICAL FIELD OF THE INVENTION

The present disclosure generally relates to three-dimensional object modeling, and more particularly, to a system and method for three-dimensional (3D) information acquisition from two-dimensional (2D) images that combines multiple 3D acquisition functions for the accurate recovery of 3D information of real world scenes.

BACKGROUND OF THE INVENTION

When a scene is filmed, the resulting video sequence contains implicit information on the three-dimensional (3D) geometry of the scene. While for adequate human perception this implicit information suffices, for many applications the exact geometry of the 3D scene is required. One category of these applications is when sophisticated data processing techniques are used, for instance in the generation of new views of the scene, or in the reconstruction of the 3D geometry for industrial inspection applications.

The process of generating 3D models from single or multiple images is important for many film post-production applications. Recovering 3D information has been an active research area for some time. There are a large number of techniques in the literature that either capture 3D information directly, for example, using a laser range finder, or recover 3D information from one or multiple two-dimensional (2D) images, such as stereo or structure from motion techniques. 3D acquisition techniques in general can be classified as active and passive approaches, single view and multi-view approaches, and geometric and photometric methods.

Passive approaches acquire 3D geometry from images or videos taken under regular lighting conditions. 3D geometry is computed using the geometric or photometric features extracted from images and videos. Active approaches use special light sources, such as laser, structured light or infrared light. Active approaches compute the geometry based on the response of the objects and scenes to the special light projected onto the surface of the objects and scenes.

Single-view approaches recover 3D geometry using multiple images taken from a single camera viewpoint. Examples include structure from motion and depth from defocus.

Multi-view approaches recover 3D geometry from multiple images taken from multiple camera viewpoints, resulting from object motion, or with different light source positions. Stereo matching is an example of multi-view 3D recovery, by matching the pixels in the left image and right image of the stereo pair to obtain the depth information of the pixels.

Geometric methods recover 3D geometry by detecting geometric features such as corners, edges, lines or contours in single or multiple images. The spatial relationship among the extracted corners, edges, lines or contours can be used to infer the 3D coordinates of the pixels in images. Structure From Motion (SFM) is a technique that attempts to reconstruct the 3D structure of a scene from a sequence of images taken from a camera moving within the scene, or from a static camera and a moving object. Although many agree that SFM is fundamentally a nonlinear problem, several attempts at representing it linearly have been made that provide mathematical elegance as well as direct solution methods. On the other hand, nonlinear techniques require iterative optimization and must contend with local minima. However, these techniques promise good numerical accuracy and flexibility. The advantage of SFM over stereo matching is that only one camera is needed. Feature-based approaches can be made more effective by tracking techniques, which exploit the past history of the features' motion to predict disparities in the next frame. Second, due to the small spatial and temporal differences between two consecutive frames, the correspondence problem can also be cast as a problem of estimating the apparent motion of the image brightness pattern, called the optical flow. There are several algorithms that use SFM; most of them are based on the reconstruction of 3D geometry from 2D images. Some assume known correspondence values, and others use statistical approaches to reconstruct without correspondence.

Photometric methods recover 3D geometry based on the shading or shadow of the image patches resulting from the orientation of the scene surface.

The above-described methods have been extensively studied for decades. However, no single technique performs well in all situations, and most of the past methods focus on 3D reconstruction under laboratory conditions, which make the reconstruction relatively easy. For real-world scenes, subjects could be in movement, lighting may be complicated, and depth range could be large. It is difficult for the above-identified techniques to handle these real-world conditions. For instance, if there is a large depth discontinuity between the foreground and background objects, the search range of stereo matching has to be significantly increased, which could result in unacceptable computational costs and additional depth estimation errors.

SUMMARY

A system and method for three-dimensional (3D) acquisition and modeling of a scene using two-dimensional (2D) images are provided. The present disclosure provides a system and method for selecting and combining the 3D acquisition techniques that best fit the capture environment and conditions under consideration, and hence produce more accurate 3D models. The techniques used depend on the scene under consideration. For example, in outdoor scenes stereo passive techniques would be used in combination with structure from motion. In other cases, active techniques may be more appropriate. Combining multiple 3D acquisition functions results in higher accuracy than if only one technique or function was used. The results of the multiple 3D acquisition functions will be combined to obtain a disparity or depth map which can be used to generate a complete 3D model. The target application of this work is 3D reconstruction of film sets. The resulting 3D models can be used for visualization during the film shooting or for postproduction.

Other applications will benefit from this approach, including but not limited to gaming and 3D TV that employs a 2D+depth format.

According to one aspect of the present disclosure, a three-dimensional (3D) acquisition method is provided. The method includes acquiring at least two two-dimensional (2D) images of a scene; applying a first depth acquisition function to the at least two 2D images; applying a second depth acquisition function to the at least two 2D images; combining an output of the first depth acquisition function with an output of the second depth acquisition function; and generating a disparity map from the combined output of the first and second depth acquisition functions.

In another aspect, the method further includes generating a depth map from the disparity map.

In a further aspect, the method includes reconstructing a three-dimensional model of the scene from the generated disparity or depth map.

According to another aspect of the present disclosure, a system for three-dimensional (3D) information acquisition from two-dimensional (2D) images includes means for acquiring at least two two-dimensional (2D) images of a scene; and a 3D acquisition module configured for applying a first depth acquisition function to the at least two 2D images, applying a second depth acquisition function to the at least two 2D images and combining an output of the first depth acquisition function with an output of the second depth acquisition function. The 3D acquisition module is further configured for generating a disparity map from the combined output of the first and second depth acquisition functions.

According to a further aspect of the present disclosure, a program storage device readable by a machine, tangibly embodying a program of instructions executable by the machine to perform method steps for acquiring three-dimensional (3D) information from two-dimensional (2D) images is provided, the method including acquiring at least two two-dimensional (2D) images of a scene; applying a first depth acquisition function to the at least two 2D images; applying a second depth acquisition function to the at least two 2D images; combining an output of the first depth acquisition function with an output of the second depth acquisition function; and generating a disparity map from the combined output of the first and second depth acquisition functions.

BRIEF DESCRIPTION OF THE DRAWINGS

These, and other aspects, features and advantages of the present disclosure will be described or become apparent from the following detailed description of the preferred embodiments, which is to be read in connection with the accompanying drawings.

In the drawings, wherein like reference numerals denote similar elements throughout the views:

FIG. 1 is an illustration of an exemplary system for three-dimensional (3D) depth information acquisition according to an aspect of the present disclosure;

FIG. 2 is a flow diagram of an exemplary method for reconstructing three-dimensional (3D) objects or scenes from two-dimensional (2D) images according to an aspect of the present disclosure;

FIG. 3 is a flow diagram of an exemplary two-pass method for 3D depth information acquisition according to an aspect of the present disclosure;

FIG. 4A illustrates two input stereo images and FIG. 4B illustrates two input structured light images;

FIG. 5A is a disparity map generated from the stereo images shown in FIG. 4A;

FIG. 5B is a disparity map generated from the structured light images shown in FIG. 4B;

FIG. 5C is a disparity map resulting from the combination of the disparity maps shown in FIGS. 5A and 5B using a simple average combination method; and

FIG. 5D is a disparity map resulting from the combination of the disparity maps shown in FIGS. 5A and 5B using a weighted average combination method.

It should be understood that the drawing(s) is for purposes of illustrating the concepts of the disclosure and is not necessarily the only possible configuration for illustrating the disclosure.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

It should be understood that the elements shown in the FIGS. may be implemented in various forms of hardware, software or combinations thereof. Preferably, these elements are implemented in a combination of hardware and software on one or more appropriately programmed general-purpose devices, which may include a processor, memory and input/output interfaces.

The present description illustrates the principles of the present disclosure. It will thus be appreciated that those skilled in the art will be able to devise various arrangements that, although not explicitly described or shown herein, embody the principles of the disclosure and are included within its spirit and scope.

All examples and conditional language recited herein are intended for pedagogical purposes to aid the reader in understanding the principles of the disclosure and the concepts contributed by the inventor to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions.

Moreover, all statements herein reciting principles, aspects, and embodiments of the disclosure, as well as specific examples thereof, are intended to encompass both structural and functional equivalents thereof. Additionally, it is intended that such equivalents include both currently known equivalents as well as equivalents developed in the future, i.e., any elements developed that perform the same function, regardless of structure.

Thus, for example, it will be appreciated by those skilled in the art that the block diagrams presented herein represent conceptual views of illustrative circuitry embodying the principles of the disclosure. Similarly, it will be appreciated that any flow charts, flow diagrams, state transition diagrams, pseudocode, and the like represent various processes which may be substantially represented in computer readable media and so executed by a computer or processor, whether or not such computer or processor is explicitly shown.

The functions of the various elements shown in the figures may be provided through the use of dedicated hardware as well as hardware capable of executing software in association with appropriate software. When provided by a processor, the functions may be provided by a single dedicated processor, by a single shared processor, or by a plurality of individual processors, some of which may be shared. Moreover, explicit use of the term “processor” or “controller” should not be construed to refer exclusively to hardware capable of executing software, and may implicitly include, without limitation, digital signal processor (“DSP”) hardware, read only memory (“ROM”) for storing software, random access memory (“RAM”), and nonvolatile storage.

Other hardware, conventional and/or custom, may also be included. Similarly, any switches shown in the figures are conceptual only. Their function may be carried out through the operation of program logic, through dedicated logic, through the interaction of program control and dedicated logic, or even manually, the particular technique being selectable by the implementer as more specifically understood from the context.

In the claims hereof, any element expressed as a means for performing a specified function is intended to encompass any way of performing that function including, for example, a) a combination of circuit elements that performs that function or b) software in any form, including, therefore, firmware, microcode or the like, combined with appropriate circuitry for executing that software to perform the function. The disclosure as defined by such claims resides in the fact that the functionalities provided by the various recited means are combined and brought together in the manner which the claims call for. It is thus regarded that any means that can provide those functionalities are equivalent to those shown herein.

The techniques disclosed in the present disclosure deal with the problem of recovering 3D geometries of objects and scenes. Recovering the geometry of real-world scenes is a challenging problem due to the movement of subjects, large depth discontinuity between foreground and background, and complicated lighting conditions. Fully recovering the complete geometry of a scene using one technique is computationally expensive and unreliable. Some of the techniques for accurate 3D acquisition, such as laser scan, are unacceptable in many situations due to the presence of human subjects. The present disclosure provides a system and method for selecting and combining the 3D acquisition techniques that best fit the capture environment and conditions under consideration, and hence produce more accurate 3D models.

A system and method for combining multiple 3D acquisition methods for the accurate recovery of 3D information of real world scenes are provided. Combining multiple methods is motivated by the lack of a single method capable of capturing 3D information for real and large environments reliably. Some methods work well indoors but not outdoors, and others require a static scene. Also, computational complexity and accuracy vary substantially between the various methods. The system and method of the present disclosure define a framework for capturing 3D information that takes advantage of the strengths of available techniques to obtain the best 3D information. The system and method of the present disclosure provide for acquiring at least two two-dimensional (2D) images of a scene; applying a first depth acquisition function to the at least two 2D images; applying a second depth acquisition function to the at least two 2D images; combining an output of the first depth acquisition function with an output of the second depth acquisition function; and generating a disparity map from the combined output of the first and second depth acquisition functions. Since disparity information is inversely proportional to depth multiplied by a scaling factor, a disparity map or a depth map generated from the combined output may be used to reconstruct 3D objects or scenes.
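The overall flow can be illustrated with a short sketch. The function arguments and names below are illustrative placeholders only and are not part of the disclosure; any two depth acquisition functions and any combination rule (such as the simple or weighted averaging described later) could be supplied.

```python
def acquire_disparity(images, first_fn, second_fn, combine_fn):
    """Apply two depth acquisition functions to the same 2D images of a
    scene and merge their outputs into a single disparity map.

    images     : sequence of at least two 2D images of the scene
    first_fn   : callable returning a per-pixel disparity estimate
    second_fn  : callable returning a per-pixel disparity estimate
    combine_fn : callable merging the two estimates
    """
    disparity_a = first_fn(images)    # e.g., stereo matching
    disparity_b = second_fn(images)   # e.g., structured light
    return combine_fn(disparity_a, disparity_b)
```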

Referring now to the Figures, exemplary system components according to an embodiment of the present disclosure are shown in FIG. 1. A scanning device 103 may be provided for scanning film prints 104, e.g., camera-original film negatives, into a digital format, e.g., Cineon-format or Society of Motion Picture and Television Engineers (SMPTE) Digital Picture Exchange (DPX) files. The scanning device 103 may comprise, e.g., a telecine or any device that will generate a video output from film such as, e.g., an Arri LocPro™ with video output. Digital images or a digital video file may be acquired by capturing a temporal sequence of video images with a digital video camera 105. Alternatively, files from the post production process or digital cinema 106 (e.g., files already in computer-readable form) can be used directly. Potential sources of computer-readable files are AVID™ editors, DPX files, D5 tapes, etc.

Scanned film prints are input to a post-processing device 102, e.g., a computer. The computer is implemented on any of the various known computer platforms having hardware such as one or more central processing units (CPU), memory 110 such as random access memory (RAM) and/or read only memory (ROM) and input/output (I/O) user interface(s) 112 such as a keyboard, cursor control device (e.g., a mouse or joystick) and display device. The computer platform also includes an operating system and micro instruction code. The various processes and functions described herein may either be part of the micro instruction code or part of a software application program (or a combination thereof) which is executed via the operating system. In one embodiment, the software application program is tangibly embodied on a program storage device, which may be uploaded to and executed by any suitable machine such as post-processing device 102. In addition, various other peripheral devices may be connected to the computer platform by various interfaces and bus structures, such as a parallel port, serial port or universal serial bus (USB). Other peripheral devices may include additional storage devices 124 and a printer 128. The printer 128 may be employed for printing a revised version of the film 126, wherein scenes may have been altered or replaced using 3D modeled objects as a result of the techniques described below.

Alternatively, files/film prints already in computer-readable form 106 (e.g., digital cinema, which, for example, may be stored on external hard drive 124) may be directly input into the computer 102. Note that the term “film” used herein may refer to either film prints or digital cinema.

A software program includes a three-dimensional (3D) reconstruction module 114 stored in the memory 110. The 3D reconstruction module 114 includes a 3D acquisition module 116 for acquiring 3D information from images. The 3D acquisition module 116 includes several 3D acquisition functions 116-1 . . . 116-n such as, but not limited to, a stereo matching function, a structured light function, a structure from motion function, and the like.

A depth adjuster 117 is provided for adjusting the depth scales of the disparity or depth maps generated from the different acquisition methods. The depth adjuster 117 scales the depth value of the pixels in the disparity or depth maps to 0-255 for each method.

A reliability estimator 118 is provided and configured for estimating the reliability of depth values for the image pixels. The reliability estimator 118 compares the depth values of each method. If the values from the various functions or methods are close or within a predetermined range, the depth value is considered reliable; otherwise, the depth value is not reliable.
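As a rough illustration only, the comparison performed by the reliability estimator could resemble the following sketch, assuming the two maps have already been brought to a common 0-255 scale; the tolerance value is a hypothetical stand-in for the predetermined range.

```python
import numpy as np

def estimate_reliability(depth_a, depth_b, tolerance=10.0):
    """Mark a pixel as reliable when two depth estimates agree.

    depth_a, depth_b : depth (or disparity) maps on the same 0-255 scale
    tolerance        : hypothetical "predetermined range" for agreement
    Returns a boolean map that is True where the estimates lie within
    the tolerance of each other.
    """
    return np.abs(depth_a.astype(float) - depth_b.astype(float)) <= tolerance
```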

The 3D reconstruction module 114 also includes a feature point detector 119 for detecting feature points in an image. The feature point detector 119 will include at least one feature point detection function, e.g., algorithms, for detecting or selecting feature points to be employed to register disparity maps. A depth map generator 120 is also provided for generating a depth map from the combined depth information.

FIG. 2 is a flow diagram of an exemplary method for reconstructing three-dimensional (3D) objects from two-dimensional (2D) images according to an aspect of the present disclosure.

Referring to FIG. 2, initially, in step 202, the post-processing device 102 obtains the digital master video file in a computer-readable format. The digital video file may be acquired by capturing a temporal sequence of video images with a digital video camera 105. Alternatively, a conventional film-type camera may capture the video sequence. In this scenario, the film is scanned via scanning device 103 and the process proceeds to step 204. The camera will acquire 2D images while moving either the object in a scene or the camera. The camera will acquire multiple viewpoints of the scene.

It is to be appreciated that whether the film is scanned or already in digital format, the digital file of the film will include indications or information on locations of the frames (i.e., timecode), e.g., a frame number, time from start of the film, etc. Each frame of the digital video file will include one image, e.g., I₁, I₂, . . . I_(n).

Combining multiple methods creates the need for new techniques to register the output of each method in a common coordinate system. The registration process can complicate the combination process significantly. In the method of the present disclosure, input image source information can be collected, at step 204, at the same time for each method. This simplifies registration since the camera position at step 206 and the camera parameters at step 208 are the same for all techniques. However, the input image source can be different for each 3D capture method used. For example, if stereo matching is used, the input image source should be two cameras separated by an appropriate distance. In another example, if structured light is used, the input image source is one or more images of structured light illuminated scenes. Preferably, the input image source to each function is aligned so that the registration of the functions' outputs is simple and straightforward. Otherwise, manual or automatic registration techniques are implemented to align, at step 210, the input image sources.

In step 212, an operator via user interface 112 selects at least two 3D acquisition functions. The 3D acquisition functions used depend on the scene under consideration. For example, in outdoor scenes stereo passive techniques would be used in combination with structure from motion. In other cases, active techniques may be more appropriate. In another example, a structured light function may be combined with a laser range finder function for a static scene. In a third example, more than two cameras can be used in an indoor scene by combining a shape from silhouette function and a stereo matching function.

A first 3D acquisition function is applied to the images in step 214 and first depth data is generated for the images in step 216. A second 3D acquisition function is applied to the images in step 218 and second depth data is generated for the images in step 220. It is to be appreciated that steps 214 and 216 may be performed concurrently or simultaneously with steps 218 and 220. Alternatively, each 3D acquisition function may be performed separately, its output stored in memory and retrieved at a later time for the combining step, as will be described below.

In step 222, the output of each 3D depth acquisition function is registered and combined. If the image sources are properly aligned, no registration is needed and the depth values can be combined efficiently. If the image sources are not aligned, the resulting disparity maps need to be aligned properly. This can be done manually or by matching a feature (e.g., marker, corner, edge) from one image to the other image via the feature point detector 119 and then shifting one of the disparity maps accordingly. Feature points are the salient features of an image, such as corners, edges, lines or the like, where there is a high amount of image intensity contrast. The feature point detector 119 may use a Kitchen-Rosenfeld corner detection operator C, as is well known in the art. This operator is used to evaluate the degree of “cornerness” of the image at a given pixel location. “Corners” are generally image features characterized by the intersection of two directions of image intensity gradient maxima, for example at a 90 degree angle. To extract feature points, the Kitchen-Rosenfeld operator is applied at each valid pixel position of image I₁. The higher the value of the operator C at a particular pixel, the higher its degree of “cornerness”, and the pixel position (x,y) in image I₁ is a feature point if C at (x,y) is greater than at other pixel positions in a neighborhood around (x,y). The neighborhood may be a 5×5 matrix centered on the pixel position (x,y). To assure robustness, the selected feature points may have a degree of cornerness greater than a threshold, such as T_(c)=10. The output from the feature point detector 119 is a set of feature points {F₁} in image I₁ where each F₁ corresponds to a “feature” pixel position in image I₁. Many other feature point detectors can be employed, including but not limited to Scale-Invariant Feature Transform (SIFT), Smallest Univalue Segment Assimilating Nucleus (SUSAN), the Hough transform, the Sobel edge operator and the Canny edge detector. After the detected feature points are chosen, a second image I₂ is processed by the feature point detector 119 to detect the feature points found in the first image I₁ and match the features to align the images.
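A minimal sketch of this step is given below. It assumes a cornerness map has already been computed (by the Kitchen-Rosenfeld operator or any other corner detector), uses the 5×5 neighborhood and threshold described above, and models the misalignment as a pure translation derived from one matched feature pair; the function names are illustrative and not part of the disclosure.

```python
import numpy as np

def select_feature_points(cornerness, threshold=10.0, radius=2):
    """Select feature points from a precomputed cornerness map.

    A pixel is kept when its cornerness exceeds the threshold (T_c = 10
    above) and is the maximum within the surrounding 5x5 neighborhood
    (radius = 2). Returns a list of (x, y) positions.
    """
    h, w = cornerness.shape
    points = []
    for y in range(radius, h - radius):
        for x in range(radius, w - radius):
            c = cornerness[y, x]
            window = cornerness[y - radius:y + radius + 1,
                                x - radius:x + radius + 1]
            if c > threshold and c >= window.max():
                points.append((x, y))
    return points

def shift_disparity_map(disparity, feature_ref, feature_src):
    """Register one disparity map to the reference image by a pure
    translation derived from a single matched (x, y) feature pair."""
    dx = int(round(feature_ref[0] - feature_src[0]))
    dy = int(round(feature_ref[1] - feature_src[1]))
    # Borders wrapped around by np.roll would normally be masked out
    # before the maps are combined.
    return np.roll(np.roll(disparity, dy, axis=0), dx, axis=1)
```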

One of the remaining registration issues is to adjust the depth scales of the disparity maps generated from the different 3D acquisition methods. This could be done automatically since a constant multiplicative factor can be fitted to the depth data available for the same pixels or points in the scene. For example, the minimum value output from each method can be scaled to 0 and the maximum value output from each method can be scaled to 255.
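For example, the min-to-0 / max-to-255 rescaling mentioned above could be implemented as in this sketch (illustrative only):

```python
import numpy as np

def normalize_depth_scale(depth_map):
    """Rescale a disparity or depth map so that its minimum maps to 0 and
    its maximum maps to 255, putting the outputs of different acquisition
    methods on a common scale."""
    d = depth_map.astype(float)
    d_min, d_max = np.nanmin(d), np.nanmax(d)
    if d_max == d_min:               # degenerate flat map: avoid divide-by-zero
        return np.zeros_like(d)
    return (d - d_min) * 255.0 / (d_max - d_min)
```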

Combining the results of the various 3D depth acquisition functions depends on many factors. Some functions or algorithms, for example, produce sparse depth data where many pixels have no depth information. In those regions, the combination must rely on the other functions. If multiple functions produce depth data at a pixel, the data may be combined by taking the average of the estimated depth data. A simple combination method combines the two disparity maps by averaging the disparity values from the two disparity maps for each pixel.
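A sketch of this simple average combination, assuming missing (sparse) estimates are marked with NaN, is given below; at pixels where only one function produced data, the result simply falls back to that value.

```python
import numpy as np

def average_combine(disp_a, disp_b):
    """Simple combination: per-pixel average of two disparity maps.

    NaN marks pixels where a function produced no estimate (sparse output);
    np.nanmean falls back to the other function's value at such pixels and
    leaves NaN only where both estimates are missing.
    """
    return np.nanmean(np.stack([disp_a, disp_b]), axis=0)
```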

Weights could be assigned to each function based on operator confidence in the function results before combining the results, e.g., based on the capture conditions (e.g., indoors, outdoors, lighting conditions) or based on the local visual features of the pixels. For instance, stereo-based approaches in general are inaccurate for the regions without texture, while structured light based methods could perform very well. Therefore, more weight can be assigned to the structured light based method by detecting the texture features of the local regions. In another example, the structured light method usually performs poorly for dark areas, while the performance of stereo matching remains reasonably good. Therefore, in this example, more weight can be assigned to the stereo matching technique.

The weighted combination method calculates the weighted average of the disparity values from the two disparity maps. The weight is determined by the intensity value of the corresponding pixel in the left-eye image of a corresponding pixel pair between the left eye and right eye images, e.g., a stereoscopic pair. If the intensity value is large, a large weight is assigned to the structured light disparity map; otherwise, a large weight is assigned to the stereo disparity map. Mathematically, the resulting disparity value is

D(x,y) = w(x,y)·D_l(x,y) + (1 − w(x,y))·D_s(x,y)

w(x,y) = g(x,y)/C

where D_l is the disparity map from structured light, D_s is the disparity map from stereo, D is the combined disparity map, g(x,y) is the intensity value of the pixel at (x,y) on the left-eye image, and C is a normalization factor to normalize the weights to the range from 0 to 1. For example, for 8-bit color depth, C should be 255.
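The weighted combination formula above translates directly into, for example, the following sketch (the array names are illustrative, and the maps are assumed to be already registered):

```python
import numpy as np

def weighted_combine(disp_structured, disp_stereo, left_image, C=255.0):
    """Weighted combination D = w*Dl + (1 - w)*Ds with w = g/C.

    disp_structured : Dl, disparity map from structured light
    disp_stereo     : Ds, disparity map from stereo matching
    left_image      : g(x, y), grayscale left-eye image aligned with the maps
    C               : normalization factor (255 for 8-bit images)
    """
    w = left_image.astype(float) / C   # bright pixels favor structured light
    return w * disp_structured + (1.0 - w) * disp_stereo
```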

Using the system and method of the present disclosure, multiple depth estimates are available for the same pixel or point in the scene, one for each 3D acquisition method used. Therefore, the system and method can also estimate the reliability of the depth values for the image pixels. For example, if all the 3D acquisition methods output very similar depth values for one pixel, e.g., within a predetermined range, then that depth value can be considered as very reliable. The opposite should happen when the depth values obtained by the different 3D acquisition methods differ vastly.

The combined disparity map may then be converted into a depth map at step 224. Disparity is inversely related to depth with a scaling factor related to camera calibration parameters. Camera calibration parameters are obtained and are employed by the depth map generator 120 to generate a depth map for the object or scene between the two images. The camera parameters include but are not limited to the focal length of the camera and the distance between the two camera shots. The camera parameters may be manually entered into the system 100 via user interface 112 or estimated from camera calibration algorithms or functions. Using the camera parameters, the depth map is generated from the combined output of the multiple 3D acquisition functions. A depth map is a two-dimensional array of values for mathematically representing a surface in space, where the rows and columns of the array correspond to the x and y location information of the surface, and the array elements are depth or distance readings to the surface from a given point or camera location. A depth map can be viewed as a grey scale image of an object, with the depth information replacing the intensity information, or pixels, at each point on the surface of the object. Accordingly, surface points are also referred to as pixels within the technology of 3D graphical construction, and the two terms will be used interchangeably within this disclosure. Since disparity information is inversely proportional to depth multiplied by a scaling factor, disparity information can be used directly for building the 3D scene model for most applications. This simplifies the computation since it makes computation of camera parameters unnecessary.
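For instance, with the focal length expressed in pixels and the baseline taken as the distance between the two camera positions, the disparity-to-depth conversion can be sketched as follows (illustrative only; a real implementation would also handle calibration and units explicitly):

```python
import numpy as np

def disparity_to_depth(disparity, focal_length_px, baseline):
    """Convert a disparity map to a depth map using depth = f * B / d.

    disparity       : disparity map in pixels
    focal_length_px : focal length of the camera expressed in pixels
    baseline        : distance between the two camera positions
    Pixels with zero (or negative) disparity are returned as infinity.
    """
    d = disparity.astype(float)
    with np.errstate(divide="ignore", invalid="ignore"):
        depth = focal_length_px * baseline / d
    depth[d <= 0] = np.inf
    return depth
```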

A complete 3D model of an object or a scene can be reconstructed from the disparity or depth map. The 3D models can then be used for a number of applications such as postproduction applications and creating 3D content from 2D. The resulting combined image can be visualized using conventional visualization tools such as the ScanAlyze software developed at Stanford University of Stanford, Calif.

The reconstructed 3D model of a particular object or scene may then be rendered for viewing on a display device or saved in a digital file 130 separate from the file containing the images. The digital file of 3D reconstruction 130 may be stored in storage device 124 for later retrieval, e.g., during an editing stage of the film where a modeled object may be inserted into a scene where the object was not previously present.

Other conventional systems use a two-pass approach to recover the geometry of the static background and dynamic foreground separately. Once the background geometry is acquired, e.g., from a static source, it can be used as a priori information to acquire the 3D geometry of moving subjects, e.g., a dynamic source. This conventional method can reduce computational cost and increase reconstruction accuracy by restricting the computation within Regions-of-Interest. However, it has been observed that the use of a single technique for recovering 3D information in each pass is not sufficient. Therefore, in another embodiment, the method of the present disclosure employing multiple depth techniques is used in each pass of a two-pass approach. FIG. 3 illustrates an exemplary method that combines the results from stereo and structured light to recover the geometry of static scenes, e.g., background scenes, and 2D-3D conversion and structure from motion for dynamic scenes, e.g., foreground scenes. The steps shown in FIG. 3 are similar to the steps described in relation to FIG. 2 and therefore have similar reference numerals, where the -1 steps, e.g., 304-1, represent steps in the first pass and the -2 steps, e.g., 304-2, represent the steps in the second pass. For example, a static input source is provided in step 304-1. A first 3D acquisition function is performed at step 314-1 and depth data is generated at step 316-1. A second 3D acquisition function is performed at step 318-1, depth data is generated at step 320-1, the depth data from the two 3D acquisition functions is combined in step 322-1 and a static disparity or depth map is generated in step 324-1. Similarly, a dynamic disparity or depth map is generated by steps 304-2 through 322-2. In step 326, a combined disparity or depth map is generated from the static disparity or depth map from the first pass and the dynamic disparity or depth map from the second pass. It is to be appreciated that FIG. 3 is just one possible example, and other algorithms and/or functions may be used and combined, as needed.
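As an illustrative sketch only, the combination in step 326 could merge the two maps as follows; the foreground mask is a hypothetical input, e.g., derived from the Regions-of-Interest covering the moving subjects in the second pass.

```python
import numpy as np

def two_pass_combine(static_disparity, dynamic_disparity, foreground_mask):
    """Merge the first-pass (static background) and second-pass (dynamic
    foreground) disparity maps.

    foreground_mask : boolean map, True inside the regions of interest
                      containing the moving subjects
    """
    return np.where(foreground_mask, dynamic_disparity, static_disparity)
```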

Images processed by the system and method of the present disclosure are illustrated in FIGS. 4A-B, where FIG. 4A illustrates two input stereo images and FIG. 4B illustrates two input structured light images. In collecting the images, each method had different requirements. For example, structured light requires darker room settings as compared to stereo. Also, different camera modes were used for each method. A single camera (e.g., a consumer grade digital camera) was used to capture the left and right stereo images by moving the camera on a slider, so that the camera conditions are identical for the left and right images. For structured light, a nightshot exposure was used, so that the color of the structured light has minimum distortion. For stereo matching, a regular automatic exposure was used since it is less sensitive to lighting environment settings. The structured light patterns were generated by a digital projector. Structured light images are taken in a dark room setting with all lights turned off except for the projector. Stereo images are taken with regular lighting conditions. During capture, the left-eye camera position was kept exactly the same for structured light and stereo matching (but the right-eye camera position can be varied), so the same reference image is used for aligning the structured light disparity map and stereo disparity map in the combination.

FIG. 5A is a disparity map generated from the stereo images shown in FIG. 4A, and FIG. 5B is a disparity map generated from the structured light images shown in FIG. 4B. FIG. 5C is a disparity map resulting from the combination of the disparity maps shown in FIGS. 5A and 5B using a simple average combination method, and FIG. 5D is a disparity map resulting from the combination of the disparity maps shown in FIGS. 5A and 5B using a weighted average combination method. In FIG. 5A, it is observed that the stereo function did not provide good depth map estimation for the box on the right. On the other hand, structured light in FIG. 5B had difficulty identifying the black chair. Although the simple combination method provided some improvement in FIG. 5C, it did not capture the chair boundaries well. The weighted combination method provides the best depth map results, with the main objects (i.e., chair, boxes) clearly identified, as shown in FIG. 5D.

Although the embodiments which incorporate the teachings of the present disclosure have been shown and described in detail herein, those skilled in the art can readily devise many other varied embodiments that still incorporate these teachings. Having described preferred embodiments for a system and method for three-dimensional (3D) acquisition and modeling of a scene (which are intended to be illustrative and not limiting), it is noted that modifications and variations can be made by persons skilled in the art in view of the above teachings. It is therefore to be understood that changes may be made in the particular embodiments of the present disclosure which are within the scope of the disclosure as set forth in the appended claims.

1. A three-dimensional acquisition method comprising: acquiring at least two two-dimensional images of a scene; applying a first depth acquisition function to the at least two two-dimensional images; applying a second depth acquisition function to the at least two two-dimensional images; combining an output of the first depth acquisition function with an output of the second depth acquisition function; and generating a disparity map from the combined output of the first and second depth acquisition functions.
2. The method of claim 1, further comprising generating a depth map from the disparity map.
3. The method of claim 1, wherein the combining step includes registering the output of the first depth acquisition function to the output of the second depth acquisition function.
4. The method of claim 3, wherein the registering step includes adjusting the depth scales of the output of the first depth acquisition function and the output of the second depth acquisition function.
5. The method of claim 1, wherein the combining step includes averaging the output of the first depth acquisition function with the output of the second depth acquisition function.
6. The method of claim 1, further comprising: applying a first weighted value to the output of the first depth acquisition function and a second weighted value to the output of the second depth acquisition function.
7. The method of claim 6, wherein the at least two two-dimensional images include a left eye view and a right eye view of a stereoscopic pair and the first weighted value is determined by an intensity of a pixel in the left eye image of a corresponding pixel pair between the left eye and right eye images.
8. The method of claim 1, further comprising reconstructing a three-dimensional model of the scene from the generated disparity map.
9. The method of claim 1, further comprising aligning the at least two two-dimensional images.
10. The method of claim 9, wherein the aligning step further includes matching a feature between the at least two two-dimensional images.
11. The method of claim 1, further comprising: applying at least a third depth acquisition function to the at least two two-dimensional images; applying at least a fourth depth acquisition function to the at least two two-dimensional images; combining an output of the third depth acquisition function with an output of the fourth depth acquisition function; generating a second disparity map from the combined output of the third and fourth depth acquisition functions; and combining the generated disparity map from the combined output of the first and second depth acquisition functions with the second disparity map from the combined output of the third and fourth depth acquisition functions.
12. A system for three-dimensional information acquisition from two-dimensional images, the system comprising: means for acquiring at least two two-dimensional images of a scene; and a three-dimensional acquisition module configured for applying a first depth acquisition function to the at least two two-dimensional images, applying a second depth acquisition function to the at least two two-dimensional images and combining an output of the first depth acquisition function with an output of the second depth acquisition function.
13. The system of claim 12, further comprising a depth map generator configured for generating a depth map from the combined output of the first and second depth acquisition functions.
14. The system of claim 12, wherein the three-dimensional acquisition module is further configured for generating a disparity map from the combined output of the first and second depth acquisition functions.
15. The system of claim 12, wherein the three-dimensional acquisition module is further configured for registering the output of the first depth acquisition function to the output of the second depth acquisition function.
16. The system of claim 15, further comprising a depth adjuster configured for adjusting the depth scales of the output of the first depth acquisition function and the output of the second depth acquisition function.
17. The system of claim 12, wherein the three-dimensional acquisition module is further configured for averaging the output of the first depth acquisition function with the output of the second depth acquisition function.
18. The system of claim 12, wherein the three-dimensional acquisition module is further configured for applying a first weighted value to the output of the first depth acquisition function and a second weighted value to the output of the second depth acquisition function.
19. The system of claim 18, wherein the at least two two-dimensional images include a left eye view and a right eye view of a stereoscopic pair and the first weighted value is determined by an intensity of a pixel in the left eye image of a corresponding pixel pair between the left eye and right eye images.
20. The system of claim 14, further comprising a three-dimensional reconstruction module configured for reconstructing a three-dimensional model of the scene from the generated depth map.
21. The system of claim 12, wherein the three-dimensional acquisition module is further configured for aligning the at least two two-dimensional images.
22. The system of claim 21, further comprising a feature point detector configured for matching a feature between the at least two two-dimensional images.
23. The system of claim 12, wherein the three-dimensional acquisition module is further configured for applying at least a third depth acquisition function to the at least two two-dimensional images, applying at least a fourth depth acquisition function to the at least two two-dimensional images, combining an output of the third depth acquisition function with an output of the fourth depth acquisition function and combining the combined output of the first and second depth acquisition functions with the combined output of the third and fourth depth acquisition functions.
24. A program storage device readable by a machine, tangibly embodying a program of instructions executable by the machine to perform method steps for acquiring three-dimensional information from two-dimensional images, the method comprising the steps of: acquiring at least two two-dimensional images of a scene; applying a first depth acquisition function to the at least two two-dimensional images; applying a second depth acquisition function to the at least two two-dimensional images; combining an output of the first depth acquisition function with an output of the second depth acquisition function; and generating a disparity map from the combined output of the first and second depth acquisition functions.